Neural Morphological Tagging from Characters for Morphologically Rich Languages
نویسندگان
چکیده
This paper investigates neural characterbased morphological tagging for languages with complex morphology and large tag sets. We systematically explore a variety of neural architectures (DNN, CNN, CNNHighway, LSTM, BLSTM) to obtain character-based word vectors combined with bidirectional LSTMs to model across-word context in an end-to-end setting. We explore supplementary use of word-based vectors trained on large amounts of unlabeled data. Our experiments for morphological tagging suggest that for ”simple” model configurations, the choice of the network architecture (CNN vs. CNNHighway vs. LSTM vs. BLSTM) or the augmentation with pre-trained word embeddings can be important and clearly impact the accuracy. Increasing the model capacity by adding depth, for example, and carefully optimizing the neural networks can lead to substantial improvements, and the differences in accuracy (but not training time) become much smaller or even negligible. Overall, our best morphological taggers for German and Czech outperform the best results reported in the literature by a large margin.
منابع مشابه
Bidirectional Long Short-Term Memory Network with a Conditional Random Field Layer for Uyghur Part-Of-Speech Tagging
Uyghur is an agglutinative and a morphologically rich language; natural language processing tasks in Uyghur can be a challenge. Word morphology is important in Uyghur part-of-speech (POS) tagging. However, POS tagging performance suffers from error propagation of morphological analyzers. To address this problem, we propose a few models for POS tagging: conditional random fields (CRF), long shor...
متن کاملVerbs are where all the action lies: Experiences of Shallow Parsing of a Morphologically Rich Language
Verb suffixes and verb complexes of morphologically rich languages carry a lot of information. We show that this information if harnessed for the task of shallow parsing can lead to dramatic improvements in accuracy for a morphologically rich languageMarathi1. The crux of the approach is to use a powerful morphological analyzer backed by a high coverage lexicon to generate rich features for a C...
متن کاملCharacter Word Embedding for NLP tasks in Indian Languages
In the recent time Word Embeddings have been used as unsupervised approach to achieve results comparable to that of supervised methods which use handcrafted features.But information about word morphology and shape is normally ignored when learning word representations.Character level embedding can capture the intra-word information specially when dealing with morphologically rich languages. Her...
متن کاملMorphological Analysis for Statistical Machine Translation
We present a novel morphological analysis technique which induces a morphological and syntactic symmetry between two languages with highly asymmetrical morphological structures to improve statistical machine translation qualities. The technique pre-supposes fine-grained segmentation of a word in the morphologically rich language into the sequence of prefix(es)-stem-suffix(es) and part-of-speech...
متن کاملmagyarlanc: A Tool for Morphological and Dependency Parsing of Hungarian
Hungarian is the stereotype of morphologically rich and free word order languages. Here, we introduce magyarlanc, a natural language toolkit developed for the linguistic preprocessing – segmentation, morphological analysis, POS-tagging and dependency parsing – of Hungarian texts. We hope that the free availability of the toolkit fosters the research not just on the Hungarian language but on all...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1606.06640 شماره
صفحات -
تاریخ انتشار 2016